library(tidyverse)
library(piano)
library(edgeR)
Set wd
Load lm results from transcripts_lm
lm_res = readRDS("../github/results/linear_models/transcripts/lm_res_symbols.rds")
head(lm_res)
Load gene sets
kegg_gsc = loadGSC(file="gene_sets/c2.cp.kegg.v7.2.symbols.gmt")
hallmarks_gsc = loadGSC(file="gene_sets/h.all.v7.2.symbols.gmt")
FDR correction
lm_res %>%
filter(term == "spliceosome_mutatedYES") %>%
mutate(fdr=p.adjust(p.value, method = "fdr")) %>%
arrange(p.value) %>%
head()
Extract FDR adjusted p-values and logFC (estimate)
cat("p-values \n")
p-values
cat("p-values \n")
p-values
head(p_adj)
PSMD2 SRRM2 COPA NDUFS7 TAB2 PABPC1
0.01134224 0.01134224 0.01134224 0.01134224 0.01134224 0.01417218
cat("\nlog2-FC \n")
log2-FC
head(log2fc)
PSMD2 SRRM2 COPA NDUFS7 TAB2 PABPC1
1.6451148 1.0252466 0.8912047 0.8341501 0.6651474 1.3064701
Run PIANO For kegg gs.
gsa_res
Final gene/gene-set association: 3082 genes and 50 gene-sets
Details:
Calculating gene set statistics from 3082 out of 36228 gene-level statistics
Removed 1301 genes from GSC due to lack of matching gene statistics
Removed 0 gene sets containing no genes after gene removal
Removed additionally 0 gene sets not matching the size limits
Loaded additional information for 50 gene sets
Gene statistic type: p-like
Method: reporter
Gene-set statistic name: Z
Significance: Theoretical null distribution
Adjustment: fdr
Gene set size limit: (1,Inf)
Permutations: 10000
Total run time: 1.5 min
GSAheatmap(gsaRes = gsa_res)
Run PIANO on KEGG gs, using non-adj p-values
gsa_adj_kegg <- runGSA(p_val,
log2fc,
gsc = kegg_gsc,
ncpus=8,
geneSetStat = "reporter",
signifMethod = "nullDist",
nPerm = 10000)
Running gene set analysis:
Checking arguments...
Found duplicates in rownames(geneLevelStats), all values will be used for calculation of gene set statistics. It is recommended to avoid this and handle duplicates prior to running runGSA. In particular, the GSEA and FGSEA implementations will give different results!
done!
Final gene/gene-set association: 2984 genes and 186 gene sets
Details:
Calculating gene set statistics from 2984 out of 36228 gene-level statistics
Removed 2261 genes from GSC due to lack of matching gene statistics
Removed 0 gene sets containing no genes after gene removal
Removed additionally 0 gene sets not matching the size limits
Loaded additional information for 186 gene sets
Calculating gene set statistics...done!
Calculating gene set significance...done!
Adjusting for multiple testing...done!
Run PIANO on KEGG gs, using gene sampling to assess significance
gsa_kegg_geneS <- runGSA(p_val,
log2fc,
gsc = kegg_gsc,
ncpus=8,
geneSetStat = "reporter",
signifMethod = "geneSampling",
nPerm = 10000)
Running gene set analysis:
Checking arguments...
Found duplicates in rownames(geneLevelStats), all values will be used for calculation of gene set statistics. It is recommended to avoid this and handle duplicates prior to running runGSA. In particular, the GSEA and FGSEA implementations will give different results!
done!
Final gene/gene-set association: 2984 genes and 186 gene sets
Details:
Calculating gene set statistics from 2984 out of 36228 gene-level statistics
Using all 36228 gene-level statistics for significance estimation
Removed 2261 genes from GSC due to lack of matching gene statistics
Removed 0 gene sets containing no genes after gene removal
Removed additionally 0 gene sets not matching the size limits
Loaded additional information for 186 gene sets
Calculating gene set statistics...done!
Calculating gene set significance...done!
Adjusting for multiple testing...done!
Run PIANO on hallmarks gsc, theoretical null distribution.
gsa_res <- runGSA(p_adj,
log2fc,
gsc = hallmarks_gsc,
ncpus=8,
geneSetStat = "reporter",
signifMethod = "nullDist",
nPerm = 10000)
Running gene set analysis:
Checking arguments...
Found duplicates in rownames(geneLevelStats), all values will be used for calculation of gene set statistics. It is recommended to avoid this and handle duplicates prior to running runGSA. In particular, the GSEA and FGSEA implementations will give different results!
done!
Final gene/gene-set association: 3082 genes and 50 gene sets
Details:
Calculating gene set statistics from 3082 out of 36228 gene-level statistics
Removed 1301 genes from GSC due to lack of matching gene statistics
Removed 0 gene sets containing no genes after gene removal
Removed additionally 0 gene sets not matching the size limits
Loaded additional information for 50 gene sets
Calculating gene set statistics...done!
Calculating gene set significance...done!
Adjusting for multiple testing...done!
GSAheatmap(gsaRes = gsa_res)
Run PIANO on hallmarks gsc, gene sampling.
gsa_res <- runGSA(p_adj,
log2fc,
gsc = hallmarks_gsc,
ncpus=8,
geneSetStat = "reporter",
signifMethod = "geneSampling",
nPerm = 10000)
Running gene set analysis:
Checking arguments...
Found duplicates in rownames(geneLevelStats), all values will be used for calculation of gene set statistics. It is recommended to avoid this and handle duplicates prior to running runGSA. In particular, the GSEA and FGSEA implementations will give different results!
done!
Final gene/gene-set association: 3082 genes and 50 gene sets
Details:
Calculating gene set statistics from 3082 out of 36228 gene-level statistics
Using all 36228 gene-level statistics for significance estimation
Removed 1301 genes from GSC due to lack of matching gene statistics
Removed 0 gene sets containing no genes after gene removal
Removed additionally 0 gene sets not matching the size limits
Loaded additional information for 50 gene sets
Calculating gene set statistics...done!
Calculating gene set significance...done!
Adjusting for multiple testing...done!
GSAheatmap(gsaRes = gsa_res)
Extract matrix
Produce tidy heatmap